This document is the Technical Appendix to the paper “Cities, Land and Space: A History of ‘Urban Economics’ as a Field” by Beatrice Cherrier and Anthony Rebours. For questions and requests concerning the part of the project contained in this appendix, please email Anthony Rebours. We also thank Julie Sixou and Aurélien Goutsmedt for their help and useful remarks.
In the article, we used co-citation analysis and network visualization to map the scientific field of urban economics and its transformations since the early 1960s. Co-citation analysis links references that are cited together by the documents of a corpus. This provides a map of the communities working on related topics or with related methods at a given point in time. The technique was pioneered by Small (1973) and Marshakova (1973) (see also Gmür 2003 and Boyack and Klavans 2010 for a review). Yves Gingras previously used it to track the transformation of research in ecology and evolution (Réale et al. 2020) and the shifts in the inner structure of physics as a discipline (Gingras 2007; Khelfaoui and Gingras 2019).1
First, we present the database that we built from the Web of Science, as well as the process whereby we matched it with data from Scopus. Second, using these databases, we produce a series of co-citation networks with the biblionetwork R package and the open-source software Gephi.
As we explain in the introduction to our article, our seeds are the three founding texts of the Alonso-Muth-Mills (AMM) model: Location and Land Use by Alonso (1964), the article “An Aggregative Model of Resource Allocation in a Metropolitan Area” by Mills (1967) and Cities and Housing by Muth (1969). These texts are among the most influential in urban economics. For instance, Table 1 shows that Muth (1969), Alonso (1964) and Mills (1967) are the 1st, 2nd and 7th most cited documents in the articles published in the Journal of Urban Economics (JUE) between 1974 and July 2019, as referenced in Clarivate Analytics’ Web of Science (WoS) databases.2 Of the 2 132 articles published by JUE between 1974 and 2019, 219 cite Muth (1969), 141 cite Alonso (1964) and 90 cite Mills (1967).
| Author | Cited document (book title or journal) | Year of publication | Number of citing articles |
|---|---|---|---|
| Muth-RF | Cities And Housing | 1969 | 219 |
| Alonso-W | Location And Land Use: Toward A Theory Of Land Rent | 1964 | 141 |
| Tiebout-CM | Journal Of Political Economy | 1956 | 130 |
| Mills-ES | Urban Economics | 1972 | 129 |
| Rosen-S | Journal Of Political Economy | 1974 | 125 |
| Mills-ES | Studies In The Structures Of The Urban Economy | 1972 | 102 |
| Mills-ES | American Economic Review | 1967 | 90 |
| Roback-J | Journal Of Political Economy | 1982 | 75 |
| Krugman-PR | Journal Of Political Economy | 1991 | 73 |
| Marshall-A | Principles Of Economics | 1920 | 67 |
| Rosenthal-SS | Handbook Of Regional And Urban Economics | 2004 | 66 |
| Henderson-JV | Economic Theory And The Cities | 1977 | 62 |
| Glaeser-EL | Journal Of Political Economy | 1992 | 60 |
| Henderson-JV | American Economic Review | 1974 | 59 |
| Wheaton-WC | Journal Of Economic Theory | 1974 | 59 |
| Kain-JF | Quarterly Journal Of Economics | 1968 | 58 |
| Oates-WE | Journal Of Political Economy | 1969 | 56 |
| Duranton-G | Handbook Of Regional And Urban Economics | 2004 | 52 |
| Fujita-M | Urban Ec Theory Land | 1989 | 50 |
| Zodrow-GR | Journal Of Urban Economics | 1986 | 50 |
Table 1. Top 20 documents (books and articles) cited by the articles published in the Journal of Urban Economics between 1974 and 2019 (WoS)
To build our corpus, we first identified all the papers, in all the journals and for all the periods covered by the WoS, that cite at least one of the three founding texts of AMM. We accessed the WoS databases via an interface built by the researchers of the Observatoire des sciences et technologie based at the CIRST, Montreal, through a series of custom-made SQL queries.3 These queries allowed us to gather 2 939 articles published between 1964 and 2019 in journals from various disciplines. In a second step, we gathered all the references cited alongside Alonso (1964), Muth (1969) and/or Mills (1967) in those articles. Because this raw data requires time-consuming manual cleaning, we carried out this second step only for a selection of periods that allows us to compare the transformation of the field over time.
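The seed queries described in footnote 3 rely on SQL `LIKE` wildcards. As a minimal sketch, assuming a toy table of WoS-style cited references with hypothetical columns `citing_id`, `first_author`, `source` and `year` (the data are invented), the same filters can be emulated in base R by translating `%` and `_` into regular expressions. Matching is case-insensitive here, which is an assumption: the case sensitivity of `LIKE` depends on the database collation.

```r
# Translate a SQL LIKE pattern into an anchored regular expression:
# '%' matches any run of characters and '_' any single character.
# Assumes the pattern contains no other regex metacharacters.
like <- function(x, pattern) {
  regex <- gsub("%", ".*", gsub("_", ".", pattern, fixed = TRUE), fixed = TRUE)
  grepl(paste0("^", regex, "$"), x, ignore.case = TRUE)
}

# Hypothetical toy data: one row per reference cited by a WoS article
refs <- data.frame(
  citing_id    = c(1, 1, 2, 3, 3, 4),
  first_author = c("Alonso-W", "Muth-RF", "Mills-ES", "Mills-E", "Muth-R", "Mills-ES"),
  source       = c("Location And Land Use", "Cities And Housing", "Am Econ Rev",
                   "American Economic Review", "Journal Of Political Economy",
                   "Am Econ Rev"),
  year         = c(1964, 1969, 1967, 1967, 1969, 1972),
  stringsAsFactors = FALSE
)

# One filter per seed text, combining author variants and title patterns
is_alonso <- refs$first_author == "Alonso-W" & like(refs$source, "Location% land%")
is_muth   <- refs$first_author %in% c("Muth-R", "Muth-RF") & like(refs$source, "Citi%")
is_mills  <- refs$first_author %in% c("Mills-E", "Mills-ES") &
  like(refs$source, "A%E%R%") & refs$year == 1967

# Citing articles that cite at least one seed form the corpus
refs$seed  <- is_alonso | is_muth | is_mills
corpus_ids <- sort(unique(refs$citing_id[refs$seed]))
print(corpus_ids)
```

In this toy example, article 4 is left out because its only Mills reference dates from 1972, which illustrates why the publication year is part of the Mills filter.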
Time windows selection: In the bibliometric literature, it is generally advised to use time windows of five to ten years to analyze citations in the social sciences. This is because the median age of the literature cited in these disciplines is about five years and citations peak around ten years after publication (Archambault and Larivière 2010). For that reason, we chose to focus on four time windows of five years each: 1975-1979, 1985-1989, 2000-2004 and 2005-2009. This choice was also informed by our qualitative research. The 1975-1979 window captures the effects on publications of the large influx of money that urban research received in the wake of the urban riots of 1965-1967. The 1985-1989 window captures the state of the field after this strong financial support had dried up, resulting in the disinterest of star economists, and before Paul Krugman published his work on geographical economics. The final window (2005-2009) captures the state of the field after the influence of the geographical economics approach had stabilized.
Table 2 shows, for each time window, the number of papers that cite at least one of the three texts of AMM and the number of references they contain.
| Time window | Number of references | Number of articles | References per article |
|---|---|---|---|
| 1975-1979 | 8626 | 318 | 27.13 |
| 1985-1989 | 8176 | 219 | 37.33 |
| 2000-2004 | 10073 | 234 | 43.05 |
| 2005-2009 | 13277 | 291 | 45.63 |
Table 2. Articles and references in each time window
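The window assignment behind Table 2 can be sketched as follows, assuming a hypothetical long table with one row per (citing article, cited reference) pair and the citing article's publication year; the toy data are invented. Because the four windows are not contiguous, articles published outside them (here, in 1982) are simply dropped.

```r
# Hypothetical toy data: one row per (citing article, cited reference) pair
refs <- data.frame(
  citing_id = c(1, 1, 1, 2, 2, 3, 4, 4, 5),
  pub_year  = c(1976, 1976, 1976, 1979, 1979, 1982, 1987, 1987, 2006)
)

# The four (non-contiguous) five-year windows used in the study
windows <- data.frame(
  label = c("1975-1979", "1985-1989", "2000-2004", "2005-2009"),
  start = c(1975, 1985, 2000, 2005),
  end   = c(1979, 1989, 2004, 2009),
  stringsAsFactors = FALSE
)

# Return the window label for a year, or NA if it falls outside all windows
assign_window <- function(year) {
  idx <- which(year >= windows$start & year <= windows$end)
  if (length(idx) == 0) NA_character_ else windows$label[idx]
}

refs$window <- vapply(refs$pub_year, assign_window, character(1))
in_window <- refs[!is.na(refs$window), ]

# Count references and distinct citing articles per window
summary_tab <- do.call(rbind, lapply(split(in_window, in_window$window), function(d) {
  data.frame(window       = d$window[1],
             n_references = nrow(d),
             n_articles   = length(unique(d$citing_id)),
             stringsAsFactors = FALSE)
}))
summary_tab$refs_per_article <- round(summary_tab$n_references / summary_tab$n_articles, 2)
print(summary_tab, row.names = FALSE)
```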
Author cleaning: Most co-citation studies present networks whose nodes, connected by edges, are individual references, such as “Kain 1965”. What we are interested in, rather, is mapping the transformation of a community of researchers interested in urban economics, and how frequently these researchers are cited together, which we take as a sign of intellectual proximity. Our main methodological challenge was thus to move from author-document entries to author entries. As a first step, we:
- excluded references whose author is an institution (for instance, documents authored by the US Bureau of the Census);
- excluded references cited only once in our database, which can be considered only weakly or accidentally associated with urban economics;
- excluded citations to publications by Alonso, Muth and Mills themselves. Being the seeds, the three authors are by construction at the center of the network. Their prominence is not new information and obscures other interesting features of our networks.
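The three exclusions listed above can be sketched in base R, assuming a hypothetical long table with one row per (citing article, cited reference) pair. The reference identifiers, the hand-made list of institutional authors, the seed-author spellings and the toy data are all illustrative. Note that the institutional reference is cited twice here, so it would survive the second filter and is only removed by the first.

```r
# Hypothetical toy data: one row per (citing article, cited reference) pair
refs <- data.frame(
  citing_id = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  ref_id    = c("Kain-JF 1968", "US-Bureau-Census 1970", "Muth-RF 1969",
                "Kain-JF 1968", "Wheaton-WC 1974", "Muth-RF 1969",
                "Kain-JF 1968", "Wheaton-WC 1974", "Oates-WE 1969",
                "US-Bureau-Census 1970"),
  stringsAsFactors = FALSE
)
# Author of each reference (here simply the part before the year)
refs$author <- sub(" [0-9]{4}$", "", refs$ref_id)

institutions <- c("US-Bureau-Census")                       # hypothetical list
seed_authors <- c("Alonso-W", "Muth-R", "Muth-RF", "Mills-E", "Mills-ES")

# 1. Drop references whose author is an institution
step1 <- refs[!refs$author %in% institutions, ]

# 2. Drop references cited by only one citing article
n_citing <- ave(step1$citing_id, step1$ref_id,
                FUN = function(x) length(unique(x)))
step2 <- step1[n_citing > 1, ]

# 3. Drop citations to the seed authors themselves
cleaned <- step2[!step2$author %in% seed_authors, ]
print(table(cleaned$author))
```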
Table 3 shows the remaining references after these modifications.
| Time window | Number of references | Number of articles | References per article |
|---|---|---|---|
| 1975-1979 | 3537 | 314 | 11.26 |
| 1985-1989 | 3367 | 219 | 15.37 |
| 2000-2004 | 6589 | 234 | 28.16 |
| 2005-2009 | 6116 | 289 | 21.16 |
Table 3. Articles and references in each time window (cleaned data)
The WoS is not exhaustive, but its gaps and biases are well identified and assessed (Archambault et al. 2009). For instance, one might object that only journal articles are indexed in the database. This limitation is not a major issue: even though research articles carry less weight in the social sciences than in fields like physics, especially before the 1990s (Larivière et al. 2006), articles published in scholarly journals were, even at the time, the vehicles par excellence for the circulation of knowledge validated by the scientific profession. Moreover, there is no reason to think that citations from books would differ from those from articles. In fact, Yves Gingras and Mahdi Khelfaoui (2019) analyzed citation patterns in different social science and humanities fields and found no significant differences between author rankings, whether authors were cited in journals, books or book chapters. Finally, the papers published in the journals indexed in the WoS cite thousands of different publication media, including books and book chapters, and not only the major journals represented in the database.
A more important limitation of the WoS is that it only lists the first author of cited references. To build a co-citation network linking the authors of the references, we need a way to collect information about all of their authors. To address this issue, we compared the references in the corpus extracted from the WoS with references from Scopus, which contains information not only about the first author but also about all the co-authors of the references it indexes.4 To match the references from our corpus with those in Scopus, we looked for references with the same first author, publication name, year of publication and, when available, the same volume.
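A minimal sketch of this matching rule, assuming two hypothetical tables in which author names and publication names have already been harmonized (in practice the WoS abbreviates source titles, so this normalization is where most of the work lies). The join is exact on first author, publication name and year, and the volume is compared only when both sides report one. The names and data are invented.

```r
# Hypothetical toy tables; names assumed already harmonized
wos <- data.frame(
  wos_ref_id   = c("w1", "w2", "w3"),
  first_author = c("doe j", "roe r", "poe e"),
  source       = c("american economic review", "journal of political economy",
                   "quarterly journal of economics"),
  year         = c(1974, 1982, 1968),
  volume       = c(64, 90, NA),
  stringsAsFactors = FALSE
)
scopus <- data.frame(
  scopus_id    = c("s1", "s2", "s3"),
  first_author = c("doe j", "roe r", "poe e"),
  source       = c("american economic review", "journal of political economy",
                   "quarterly journal of economics"),
  year         = c(1974, 1982, 1968),
  volume       = c(64, 91, 82),
  all_authors  = c("doe j; moe k", "roe r", "poe e; low t; hay m"),
  stringsAsFactors = FALSE
)

# Exact match on first author, publication name and year ...
candidates <- merge(wos, scopus, by = c("first_author", "source", "year"),
                    suffixes = c("_wos", "_scopus"))
# ... then require the same volume only when both databases report one
same_volume <- is.na(candidates$volume_wos) | is.na(candidates$volume_scopus) |
  candidates$volume_wos == candidates$volume_scopus
matched   <- candidates[same_volume, c("wos_ref_id", "scopus_id", "all_authors")]
unmatched <- setdiff(wos$wos_ref_id, matched$wos_ref_id)  # left for hand-searching
print(matched)
print(unmatched)
```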
Unfortunately, not all of our references were covered in Scopus (the code used to query the Scopus API and the results are provided in our replication package).
Given that all missing references had to be searched for by hand, we restricted our corpus to works cited at least five times in a given time window. The tables below show the total number of references we finally kept for our networks.
| Citations received | Number of cited documents | Total citations |
|---|---|---|
| 5 | 22 | 110 |
| 6 | 15 | 90 |
| 7 | 9 | 63 |
| 8 | 9 | 72 |
| 9 | 3 | 27 |
| 10 | 8 | 80 |
| 11 | 2 | 22 |
| 12 | 6 | 72 |
| 13 | 3 | 39 |
| 14 | 1 | 14 |
| 15 | 1 | 15 |
| 17 | 1 | 17 |
| 19 | 1 | 19 |
| 21 | 1 | 21 |
| 23 | 1 | 23 |
| 38 | 1 | 38 |
| Total | 84 | 722 |
| Citations received | Number of cited documents | Total citations |
|---|---|---|
| 5 | 15 | 75 |
| 6 | 18 | 108 |
| 7 | 11 | 77 |
| 8 | 7 | 56 |
| 10 | 5 | 50 |
| 11 | 5 | 55 |
| 12 | 2 | 24 |
| 13 | 2 | 26 |
| 15 | 1 | 15 |
| 21 | 2 | 42 |
| Total | 68 | 528 |
| Citations received | Number of cited documents | Total citations |
|---|---|---|
| 5 | 48 | 240 |
| 6 | 26 | 156 |
| 7 | 16 | 112 |
| 8 | 11 | 88 |
| 9 | 11 | 99 |
| 10 | 8 | 80 |
| 11 | 7 | 77 |
| 12 | 4 | 48 |
| 13 | 6 | 78 |
| 15 | 2 | 30 |
| 16 | 1 | 16 |
| 17 | 1 | 17 |
| 18 | 1 | 18 |
| 21 | 1 | 21 |
| 23 | 1 | 23 |
| 24 | 1 | 24 |
| 26 | 2 | 52 |
| 28 | 1 | 28 |
| 29 | 1 | 29 |
| Total | 149 | 1236 |
| Citations received | Number of cited documents | Total citations |
|---|---|---|
| 5 | 68 | 340 |
| 6 | 33 | 198 |
| 7 | 25 | 175 |
| 8 | 19 | 152 |
| 9 | 8 | 72 |
| 10 | 11 | 110 |
| 11 | 12 | 132 |
| 12 | 2 | 24 |
| 14 | 3 | 42 |
| 15 | 1 | 15 |
| 16 | 1 | 16 |
| 18 | 2 | 36 |
| 19 | 1 | 19 |
| 20 | 1 | 20 |
| 21 | 1 | 21 |
| 22 | 1 | 22 |
| 27 | 1 | 27 |
| 29 | 1 | 29 |
| 35 | 1 | 35 |
| Total | 192 | 1485 |
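The five-citation threshold and the distribution tables above can be reproduced from a citation list along these lines, assuming a hypothetical table with one row per citation of a cited document within one time window (toy data). The last column is simply the number of citations received multiplied by the number of documents receiving them.

```r
# Hypothetical toy data: one row per citation of a cited document
cites <- data.frame(
  ref_id = rep(c("d1", "d2", "d3", "d4", "d5"), times = c(5, 5, 7, 3, 12)),
  stringsAsFactors = FALSE
)

n_cit <- table(cites$ref_id)          # citations received by each document
kept  <- n_cit[n_cit >= 5]            # keep documents cited at least 5 times

# Distribution: how many kept documents received each number of citations
dist <- as.data.frame(table(as.integer(kept)), stringsAsFactors = FALSE)
names(dist) <- c("citations_received", "n_documents")
dist$citations_received <- as.integer(dist$citations_received)
dist$total_citations <- dist$citations_received * dist$n_documents
print(dist)
cat("Total:", sum(dist$n_documents), "documents,",
    sum(dist$total_citations), "citations\n")
```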
Our networks are based on the co-citation technique. In bibliometrics and network analysis, co-citation and bibliographic coupling are the two main techniques used to analyze how scientific works relate to one another; neither was originally developed for historical purposes. Co-citation links cited documents (articles, books, book chapters) according to the number of times they appear together in the references of citing articles. If two documents are co-cited (that is, referenced in the same citing document), there is a link between them, and the weight of the link depends on the number of citing documents that co-cite these two nodes. In bibliographic coupling, by contrast, citing documents are linked according to the number of references they share in their respective bibliographies: the more references they share, the stronger the link between them. Both techniques highlight how a discipline structures itself at a given moment in time. We chose co-citation analysis because it allows us to capture renewed interest in old references, and how these can serve as a reference base for different communities across time. For instance, Christaller may be cited often together with Isard in the 1960s, then disappear from bibliographies, then be cited anew in the 1990s alongside Krugman’s work. Co-citation provides less information on the structure of a discipline at time t, but it allows us to study how old communities around given references, topics or methods are recomposed over time.5
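The difference between the two techniques can be made concrete with a citing-article × cited-document incidence matrix. In this sketch (toy data, hypothetical labels), co-citation counts are the off-diagonal entries of the cross-product of the matrix with itself, while bibliographic coupling counts are those of the product of the matrix with its transpose.

```r
# Rows: citing articles; columns: cited documents (1 = the article cites it)
A <- matrix(c(1, 1, 0, 1,
              1, 1, 1, 0,
              0, 1, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("art1", "art2", "art3"),
                            c("ref1", "ref2", "ref3", "ref4")))

cocitation <- crossprod(A)   # t(A) %*% A: links between cited documents
coupling   <- tcrossprod(A)  # A %*% t(A): links between citing articles

# Diagonals hold citations received / references made; drop them as self-links
diag(cocitation) <- 0
diag(coupling)   <- 0
print(cocitation)
print(coupling)
```

Here ref1 and ref2 are co-cited by two articles (art1 and art2), while art2 and art3 are coupled through the two references they share (ref2 and ref3).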
Since our focus is on urban economists, not just on the work they produced, we applied the co-citation method in a way that makes cited authors the nodes. To do so, we grouped together all cited documents associated with an author’s name in a given time window (White and Griffith 1981). The links thus represent how often two authors are cited together during this time window, which we interpret as a sign of intellectual proximity (from the perspective of the researchers who cite them together). Focusing on authors required us to recover all the authors of each cited reference, but also to separate co-authors in our database. To do so, we followed Dangzhi Zhao (2006) and treated co-authorship as co-citation.6 An alternative to this “inclusive all-author co-citation analysis” would have been an “exclusive all-author co-citation analysis”, which does not count co-authorship as co-citation. The problem with this approach is that an author present in our corpus only as the second author of a single document would appear in our networks as connected with all the authors with whom he is co-cited, except his own co-author. Treating co-authorship as co-citation is also consistent with considering co-authorship as another sign of cognitive proximity.7
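The inclusive variant can be sketched as follows, with invented authors and documents. Each cited document is expanded into all its authors, and every pair of distinct authors cited by the same article counts as one co-citation. Because co-authors of the same document necessarily appear together, co-authorship is counted as co-citation: in the toy data, article art3 cites only d1, yet it still links A and B. Roughly speaking, an exclusive variant would instead discard pairs arising only from co-authorship within a single document.

```r
# Hypothetical toy data: authors of each cited document ...
doc_authors <- list(d1 = c("A", "B"), d2 = "C", d3 = c("B", "D"))
# ... and documents cited by each citing article
citing <- list(art1 = c("d1", "d2"), art2 = c("d2", "d3"), art3 = "d1")

# For each citing article, collect every distinct cited author (co-authors
# of the same document included) and list all unordered author pairs
pair_keys <- unlist(lapply(citing, function(docs) {
  authors <- sort(unique(unlist(doc_authors[docs])))
  if (length(authors) < 2) return(character(0))
  pairs <- combn(authors, 2)
  paste(pairs[1, ], pairs[2, ], sep = "--")
}))

# Raw author co-citation counts
author_cocitation <- table(pair_keys)
print(author_cocitation)
```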
Another aspect we had to consider is that our corpus contains references from articles published in journals from different disciplines and periods. The main problem is that citation practices vary over time and across scientific fields (Gingras and Larivière 2014). In particular, documents with more references are likely to contain more co-cited documents, which means that disciplines or journals where the practice is to cite more references might inflate the importance of certain authors in the networks. Similarly, authors who self-cite a lot or who are cited multiple times in a given paper might carry more weight than they should. However, what we are interested in is the systematic character of the relationship between each author and the community interested in urban economics topics, rather than the number of citations in a limited set of works. As is common in social-network analysis, we addressed this issue by using Salton’s cosine measure to normalize co-citations in our corpus (van Eck and Waltman 2009). Salton’s measure divides the number of times two references are co-cited by the square root of the product of the number of times each of them is cited (Sen and Gan 1983). To produce the edge lists for our co-citation networks with the normalized values for the co-cited authors, we used the biblionetwork R package designed by Goutsmedt, Claveau and Truc (2021). We produced an edge list for each time window. Here is the code we used to generate them:
```r
# Packages
library(here)          # to build paths relative to the project root
library(rio)           # to import and export data tables
library(dplyr)         # for the %>% pipe and select()
library(biblionetwork) # to create co-citation edge lists

# Input data, one file per time window
data7579 <- import(here("data", "cocitation_inputs", "input7579.csv"))
data8589 <- import(here("data", "cocitation_inputs", "input8589.csv"))
data0004 <- import(here("data", "cocitation_inputs", "input0004.csv"))
data0509 <- import(here("data", "cocitation_inputs", "input0509.csv"))

# Co-citation tables with normalized weights; the from/to columns are dropped
cocitation7579 <- data7579 %>%
  biblio_cocitation(source = "ID_Article_citant", ref = "value") %>%
  select(-c("from", "to"))
cocitation8589 <- data8589 %>%
  biblio_cocitation(source = "ID_Article_citant", ref = "value") %>%
  select(-c("from", "to"))
# The 2000-2004 input file uses different column names
cocitation0004 <- data0004 %>%
  biblio_cocitation(source = "Citing_ID", ref = "Name_Aut") %>%
  select(-c("from", "to"))
cocitation0509 <- data0509 %>%
  biblio_cocitation(source = "ID_Article_citant", ref = "value") %>%
  select(-c("from", "to"))

# Export: one file per time window; sep and row.names are arguments
# of export(), not of here()
cocitation7579 %>%
  export(here("data", "cocitations", "cocitations7579.csv"),
         sep = ";", row.names = FALSE)
cocitation8589 %>%
  export(here("data", "cocitations", "cocitations8589.csv"),
         sep = ";", row.names = FALSE)
cocitation0004 %>%
  export(here("data", "cocitations", "cocitations0004.csv"),
         sep = ";", row.names = FALSE)
cocitation0509 %>%
  export(here("data", "cocitations", "cocitations0509.csv"),
         sep = ";", row.names = FALSE)
```
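For readers who want to check the normalization by hand, here is a minimal base-R sketch of Salton’s cosine for co-citation, written independently of biblionetwork’s own implementation: the co-citation count of two authors is divided by the square root of the product of the number of articles citing each of them. The incidence matrix is invented.

```r
# Rows: citing articles; columns: cited authors (1 = the article cites the author)
A <- matrix(c(1, 1, 1, 0,
              1, 1, 0, 0,
              1, 0, 0, 1,
              0, 1, 1, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("art", 1:4), c("A", "B", "C", "D")))

co     <- crossprod(A)                      # raw co-citation counts
n_cit  <- diag(co)                          # number of articles citing each author
cosine <- co / sqrt(outer(n_cit, n_cit))    # Salton's cosine
diag(cosine) <- 0

# Edge list, one row per co-cited pair, in a Source/Target/weight layout
idx   <- which(upper.tri(cosine) & co > 0, arr.ind = TRUE)
edges <- data.frame(Source = rownames(cosine)[idx[, 1]],
                    Target = colnames(cosine)[idx[, 2]],
                    weight = round(cosine[idx], 3))
print(edges)
```

The A-B and B-C pairs are each co-cited twice, but B-C gets the higher weight (about 0.82 against 0.67) because C is cited less often than A overall.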
Once we had the edge list for each time window, we imported them into the open-source software Gephi to generate our co-citation networks. We used Gephi’s ForceAtlas2 layout algorithm (Jacomy et al. 2014), which calculates the location of each node (author) in the network according to the intensity of the edges (co-citations) it has with the other nodes. The more often two authors are co-cited, the thicker the edge between them and the closer they appear on the network map. In addition, we used Gephi’s option to calculate the weighted degree centrality of all the authors present in the networks. This quantitative indicator measures the sum of the weights of the links an author has with all other scientists (Freeman 1978/1979). Because links are weighted, it takes into account the strength of the co-citation link between two authors, and not only the fact that they are co-cited, as the simple (unweighted) degree centrality does. In our co-citation networks, this is reflected in the size of the nodes.
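Weighted degree, as used for node sizes, is the sum of the weights of the edges incident to a node. A base-R sketch on an invented undirected edge list (assumed to contain no self-loops), compared with the unweighted degree:

```r
# Hypothetical undirected edge list with normalized co-citation weights
edges <- data.frame(
  Source = c("A", "A", "B", "C"),
  Target = c("B", "C", "C", "D"),
  weight = c(0.5, 0.2, 0.8, 0.1),
  stringsAsFactors = FALSE
)
nodes <- sort(unique(c(edges$Source, edges$Target)))

# Each edge counts for both of its endpoints
incident <- function(n) edges$Source == n | edges$Target == n
weighted_degree <- sapply(nodes, function(n) sum(edges$weight[incident(n)]))
degree          <- sapply(nodes, function(n) sum(incident(n)))
print(rbind(degree, weighted_degree))
```

A and B have the same unweighted degree (2), but B’s stronger links give it a larger weighted degree (1.3 against 0.7).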
To better visualize the communities resulting from our co-citation analysis, we applied the Leiden community detection algorithm designed by Traag, Waltman and van Eck (2019).8 The algorithm maximizes modularity, a measure of the quality of a particular division of a network (Newman and Girvan 2004). In other words, it identifies groups of authors that have much stronger co-citation links with each other than with authors outside the group. Our basic assumption here is that scholars sharing the same interests in domains or methods should cluster into a group of cohesive authors (“communities”). For each period, we used the same resolution of 1 with 1 000 iterations, which ensured that we obtained the same clusters each time we imported our edge lists into Gephi. Communities are represented in different colors in the resulting networks, presented in Figures 1 to 4 in the Annex below.
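As commonly defined for a weighted undirected network, modularity with a resolution parameter γ is Q = (1/2m) Σ_ij [w_ij − γ k_i k_j / (2m)] δ(c_i, c_j), where w_ij is the edge weight, k_i the weighted degree of node i, 2m the sum of all weights counted in both directions, c_i the community of node i and γ the resolution (1 in our networks). The sketch below only evaluates Q for a given partition; it does not implement the Leiden algorithm itself, and the toy network (two triangles joined by a weak edge) is invented.

```r
# Weighted modularity of a partition, with resolution parameter gamma
modularity <- function(W, membership, resolution = 1) {
  k     <- rowSums(W)                        # weighted degree of each node
  two_m <- sum(W)                            # total weight, counted in both directions
  same  <- outer(membership, membership, "==")
  sum((W - resolution * outer(k, k) / two_m) * same) / two_m
}

# Toy network: two triangles {1,2,3} and {4,5,6} joined by a weak edge 3-4
edge_list <- rbind(c(1, 2, 1), c(1, 3, 1), c(2, 3, 1),
                   c(4, 5, 1), c(4, 6, 1), c(5, 6, 1),
                   c(3, 4, 0.2))
W <- matrix(0, 6, 6)
W[edge_list[, 1:2]] <- edge_list[, 3]        # fill both halves of the
W[edge_list[, 2:1]] <- edge_list[, 3]        # symmetric adjacency matrix

q_split  <- modularity(W, c(1, 1, 1, 2, 2, 2))
q_single <- modularity(W, rep(1, 6))
print(c(split = q_split, single = q_single))
```

Putting every node in one community always gives Q = 0 at resolution 1, whereas splitting the two triangles gives a clearly positive value, which is the kind of partition a modularity-maximizing algorithm looks for.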
Finally, we applied a general threshold on edges. As these figures make clear, the full networks are too dense to be properly interpreted, so we also propose a network for each time window in which all the nodes, but only the most salient co-citation edges, are represented. To do so, we imposed a co-citation threshold below which edges are not displayed. We chose this threshold so that it alters the structure of the networks as little as possible, since modularity maximization is known to be sensitive to threshold effects (Fortunato and Barthélemy 2007). To avoid introducing artificial changes in community structures, we applied the Leiden algorithm before introducing the threshold, and we used the same threshold for all four networks, keeping only links with a normalized weight greater than 0.2. Figures 1 to 4 show the impact of these choices, before (at the bottom of each figure) and after (at the top of each figure) applying the threshold.
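The order of operations matters: communities are detected on the full network, and the threshold only filters which edges are displayed. A base-R sketch with invented data, keeping edges with a normalized weight strictly greater than 0.2 while leaving the node table, and thus the community labels, untouched:

```r
# Hypothetical normalized edge list for one time window
edges <- data.frame(
  Source = c("A", "A", "B", "C", "D", "C"),
  Target = c("B", "C", "C", "D", "E", "F"),
  weight = c(0.45, 0.15, 0.30, 0.05, 0.25, 0.10),
  stringsAsFactors = FALSE
)
# Community labels computed beforehand on the full, unthresholded network
nodes <- data.frame(Id = c("A", "B", "C", "D", "E", "F"),
                    community = c(1, 1, 1, 2, 2, 2),
                    stringsAsFactors = FALSE)

threshold     <- 0.2
display_edges <- edges[edges$weight > threshold, ]   # only the most salient links

# Every node is kept, even if all its edges fall below the threshold
isolated <- setdiff(nodes$Id, c(display_edges$Source, display_edges$Target))
print(display_edges)
print(isolated)
```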
Alonso W., 1964, Location and Land Use, Cambridge: Harvard University Press
Archambault E., Campbell D., Gingras Y. and Larivière V., 2009, “Comparing Bibliometric Statistics Obtained From the Web of Science and Scopus”, Journal of the American Society for Information Science and Technology, 60, 7: 1320-1326
Archambault E. and Larivière V., 2010, “The Limits of Bibliometrics for the Analysis of the Social Sciences and Humanities Literature”, in Bokova I., Sané P. and Hernes G. (eds) The World Social Science Report: Knowledge Divides, Paris: Unesco Publishing: 251-254
Doehne M. and Herfeld C., 2018, “The Diffusion Of Scientific Innovations: A Role Typology”, Studies in History and Philosophy of Science, xxx: 1-18
Fortunato S. and Barthélemy M., 2007, “Resolution Limit in Community Detection”, Proceedings of the National Academy of Sciences, 104, 1: 36-41
Freeman L.C., 1978/1979, “Centrality in Social Networks: Conceptual Clarification”, Social Networks, 1: 215-239
Gingras Y. and Khelfaoui M., 2019, “Do We Need a Book Citation Index for Research Evaluation?”, Research Evaluation: 1-11
Gingras Y. and Larivière V., 2014, “Measuring Interdisciplinarity” in Cronin B. and Sugimoto C.R. (eds) Beyond Bibliometrics: Harnessing Multidimensional Indicators of Scholarly Impact, Cambridge, London: The MIT Press
Gmür M., 2003, “Co-citation Analysis and the Search of Invisible Colleges: A Methodological Evaluation”, Scientometrics, 57, 1: 27-57
Goutsmedt A., 2021, “From the Stagflation to the Great Inflation: Explaining the US Economy of the 1970s”, Revue d’économie politique, 131, 3: 557-582
Goutsmedt A., Claveau F. and Truc A., 2021, “Biblionetwork: A Package For Creating Different Types of Bibliometric Networks”, R Package version 0.0.0.9000 https://github.com/agoutsmedt/biblionetwork
Larivière V., Archambault E., Gingras Y. and Vignola-Gagné E., 2006, “The Place of Serials in Referencing Practices: Comparing Natural Sciences and Engineering With Social Sciences and Humanities”, Journal of the American Society for Information Science and Technology, 57, 8: 997-1004
Marshakova-Shaikevich I., 1973, “System of Document Connections Based on References”, Nauchno-Tekhnicheskaya Informatsiya, Seriya 2, 6: 3-8
Mills E.S., 1967, “An Aggregative Model of Resource Allocation in a Metropolitan Area”, American Economic Review, 57, 2: 197-210
Muth R.F., 1969, Cities and Housing, Chicago: University of Chicago Press
Newman M.E.J. and Girvan M., 2004, “Finding and Evaluating Community Structure in Networks”, Physical Review E, 69, 026113
Réale D., Khelfaoui M., Montiglio P.-O. and Gingras Y., 2020, “Mapping the Dynamics of Research Networks in Ecology and Evolution Using Co‑Citation Analysis (1975–2014)”, Scientometrics, https://doi.org/10.1007/s11192-019-03340-4
Sen S.K. and Gan S.K., 1983, “A Mathematical Extension of the Idea of Bibliographic Coupling and Its Applications”, Annals of Library Science and Documentation, 30, 2: 78-82
Small H.G., 1973, “Co-citation in the Scientific Literature: A New Measure of the Relationship Between Two Documents”, Journal of the American Society for Information Science, 24, 4: 265-269
Traag V.A., Waltman L. and van Eck N.J., 2019, “From Louvain to Leiden: Guaranteeing Well-connected Communities”, Scientific Reports, 9, 5233
van Eck N.J. and Waltman L., 2009, “How to Normalize Cooccurrence Data? An Analysis of Some Well-Known Similarity Measures”, Journal of the American Society for Information Science and Technology, 60, 8: 1635-1651
White H.D. and Griffith B.C., 1981, “Author Co-citation: A Literature Measure of Intellectual Structure”, Journal of the American Society for Information Science, 32: 163-171
Zhao D., 2006, “Towards All-Author Co-citation Analysis”, Information Processing and Management, 42: 1578-1591
See also Doehne and Herfeld (2018) on the diffusion of the theory of rational decision-making, and Gingras and Schinckus (2012) on the institutionalization of econophysics.↩︎
Because not all 2019 JUE publications were yet indexed in the WoS when the data were collected, our dataset covers only 12 JUE papers for 2019.↩︎
To identify references to Alonso (1964), for instance, we searched all the articles indexed in the WoS for references that contain ‘Alonso-W’ as first author and expressions similar to ‘Location% land%’ as document name. Since Muth and Mills both appear in the WoS with and without their middle initial, we looked at every possible combination of surname and initials for Muth (1969) and Mills (1967). That is, for Muth (1969), all articles citing references with any combination of ‘Muth-R’ or ‘Muth-RF’ and ‘Citi%’ were included in our corpus. For Mills (1967), all the articles containing references with any combination of ‘Mills-E’ or ‘Mills-ES’, ‘A%E%R%’ and ‘1967’ as the publication year were kept. For journal articles like Mills (1967), only the name of the journal is available in the WoS to identify the reference, so we looked for references with ‘American Economic Review’ as the document name. We used the expression ‘A%E%R%’ in our query because references in the WoS can appear, for instance, as ‘Am Economic Rev’ or ‘Am Ec Rev’; searching for expressions similar to ‘A%E%R%’ allows us to capture all these variants. Moreover, since Mills may have published several papers in the American Economic Review, we also added the publication year of the paper we were looking for (1967 in this case).↩︎
Nevertheless, the WoS databases seem to remain more reliable, as the items included in Scopus are less controlled in terms of scientific norms than those in the WoS, which applies a list of criteria to decide which publications are deemed scientific. This list includes aspects such as international recognition, peer-review process, scientific editorial norms, etc. (Larivière and Sugimoto 2018, 40).↩︎
See Goutsmedt 2021 for an example of a comparison between both methods applied to a corpus in macroeconomics, and Boyack and Klavans 2010 for a general comparison.↩︎
Zhao (2006, 1580) defines an author’s body of work “as all works with this author as one of the authors of each of these works.”↩︎
Co-authorship and co-citation remain conceptually different types of proximity: co-authorship results from the choices of those who are cited, whereas co-citation results from the choices of the citing authors.↩︎
For the Louvain algorithm, we used a resolution of 1.0 for each time window, the number of communities being determined by the algorithm.↩︎